Measurement of signal-to-noise ratio in dysphonic voices by image processing of spectrograms

Authors

Maurílio N. Vieira

João Pedro H. Sansão

Hani Yehia

Abstract

The measurement of glottal noise was investigated in human and synthesized dysphonic voices by means of two-dimensional (2D) speech processing. A prime objective was to reduce the sensitivity of the measurement to fundamental frequency (f0) tracking errors and phonatory aperiodicities. An available fingerprint image enhancement algorithm was used for signal-to-noise measurement in narrow-band spectrographic images. This spectrographic signal-to-noise ratio estimation method (SNR) creates binary masks, mainly based on the orientation field of the partials, to separate energy in regions with strong harmonics from energy in noisy areas. Synthesized vowels with additive noise were used to calibrate the algorithm, validate the calibration, and systematically evaluate its dependence on f0, shimmer (cycle-to-cycle amplitude perturbation), and jitter (cycle-to-cycle f0 perturbation). In synthesized voices with known signal-to-noise ratios in the 5–40 dB range, SNR estimates were, on average, accurate within ±3.2 dB and robust to variations in f0 (120 Hz or 220 Hz), jitter (0–3%), and shimmer (0–30%). In human /a/ produced by dysphonic speakers, SNR values and perceptual ratings of breathiness revealed a non-linear but monotonic decay of SNR with increased breathiness. Comparison between SNR and related acoustic measurements indicated similar behaviors regarding the relationship with breathiness and immunity to shimmer, but the other methods were markedly influenced by jitter. Overall, the SNR method did not rely on accurate f0 estimation, was robust to vocal perturbations, and was largely independent of vowel type; it also has potential application in running speech. 2014 Elsevier B.V. All rights reserved.
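The mask-based energy split described in the abstract can be illustrated with a minimal sketch. It assumes the binary mask is already given; the orientation-field step that derives the mask and the calibration against synthesized vowels are not reproduced here. The function name `masked_snr_db` and the toy data are hypothetical and do not come from the paper.

```python
import math


def masked_snr_db(power, mask):
    """Return 10*log10(harmonic energy / noise energy) in dB.

    power: 2D list of non-negative spectrogram power values (frequency x time).
    mask:  2D list of booleans, True where a cell is treated as harmonic.
    Assumption: the mask is supplied by the caller; this sketch does not
    compute it from an orientation field as the paper's method does.
    """
    harmonic = 0.0
    noise = 0.0
    for power_row, mask_row in zip(power, mask):
        for value, is_harmonic in zip(power_row, mask_row):
            if is_harmonic:
                harmonic += value
            else:
                noise += value
    if harmonic <= 0.0 or noise <= 0.0:
        raise ValueError("both regions need positive energy")
    return 10.0 * math.log10(harmonic / noise)


# Toy 3x2 "spectrogram": two strong harmonic bands around one noisy band.
power = [[100.0, 100.0],
         [1.0, 1.0],
         [100.0, 100.0]]
mask = [[True, True],
        [False, False],
        [True, True]]
snr = masked_snr_db(power, mask)
print(f"{snr:.2f} dB")
```

In this toy case the harmonic cells sum to 400 and the noise cells to 2, so the raw ratio is 200. According to the abstract, the actual method calibrates its estimates on synthesized vowels, so a raw ratio like this one should not be read as the paper's reported SNR.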

Similar resources

Shearlet-Based Adaptive Noise Reduction in CT Images

The noise in reconstructed slices of X-ray Computed Tomography (CT) is of unknown distribution, non-stationary, oriented and difficult to distinguish from main structural information. This requires the development of special post-processing methods based on the local statistical evaluation of the noise component. This paper presents an adaptive method of reducing noise in CT images employing th...

Biomedical Image Denoising Based on Hybrid Optimization Algorithm and Sequential Filters

Background: Image de-noising now plays a very important role in medical analysis applications and as a pre-processing step. Many filters were designed for image processing under the assumption of a specific noise distribution, so images acquired by different medical imaging modalities must be freed of noise. Objectives: This study has focused on the sequence filters which are selected ...

Grid Impedance Estimation Using Several Short-Term Low Power Signal Injections

In this paper, a signal processing method is proposed to estimate the low- and high-frequency impedances of power systems using several short-term low-power signal injections for a frequency range of 0-150 kHz. This frequency range is very important, so it is considered in the analysis of power quality issues of smart grids. The impedance estimation is used in many power system applicati...

Processing Digital Image for Measurement of Crack Dimensions in Concrete

The elements of a concrete structure are most frequently affected by cracking. Crack detection is essential to ensure safety and performance during the structure's service life. Because cracks do not have a regular shape, general mathematical formulae cannot be used to obtain their exact dimensions. The authors have proposed a new method which aims to measure the crack dimen...

Synthesis of breathy and rough voices with a view to validating perceptual and automatic glottal cycle pattern recognition

The framework of the presentation is the assessment of the ability of human raters or speech processing software to detect glottal cycles in speech sounds and measure their lengths in synthetic breathy and rough voices. The synthesis of hoarse voices designates the generation of speech sounds the timbre of which simulates the voice quality of dysphonic speakers. The added value of synthetically ...

Journal title:

Speech Communication

Volume 61

Publication date: 2014
